
    Enabling Open-Set Person Re-Identification for Real-World Scenarios

    Person re-identification (re-ID) is a significant computer-vision problem that is receiving increasing scientific attention. To date, numerous studies have been conducted to improve the accuracy and robustness of person re-ID to meet practical demands. However, most previous efforts have concentrated on solving the closed-set variant of the problem, where a query is assumed to always have a correct match within the set of known people (the gallery set). This assumption is usually not valid for industrial re-ID use cases. In this study, we focus on the open-set person re-ID problem, where, in addition to the similarity ranking, the solution is expected to detect the presence or absence of a given query identity within the gallery set. To determine good practices and to assess the practicality of person re-ID in industrial applications, first, we convert popular closed-set person re-ID datasets into the open-set scenario. Second, we compare the performance of eight state-of-the-art closed-set person re-ID methods under open-set conditions. Third, we experimentally determine the effectiveness of different loss-function combinations for the open-set problem. Finally, we investigate the impact of a statistics-driven gallery-refinement approach on open-set person re-ID performance in the low false-acceptance-rate (FAR) region, while simultaneously reducing the computational demands of retrieval. Results show an average detection and identification rate increase of 8.38% and 3.39% on the DukeMTMC-reID and Market1501 datasets, respectively, at a FAR of 1%.
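    As a concrete illustration of the evaluation protocol above, the following sketch computes the detection-and-identification rate at a fixed FAR from query and gallery embeddings. It is a minimal example under assumed conventions (L2-normalized features, cosine similarity); names such as query_feats and gallery_ids are illustrative and not taken from the paper.

```python
# Minimal sketch: open-set re-ID evaluation at a fixed false-acceptance rate (FAR).
# Assumes L2-normalized embeddings; all names are illustrative placeholders.
import numpy as np

def dir_at_far(query_feats, query_ids, gallery_feats, gallery_ids, target_far=0.01):
    """Detection-and-identification rate (DIR) at the given FAR.

    Queries whose identity is absent from gallery_ids act as impostor queries.
    """
    sims = query_feats @ gallery_feats.T          # cosine similarities (pre-normalized)
    best_idx = sims.argmax(axis=1)                # rank-1 gallery match per query
    best_sim = sims.max(axis=1)
    in_gallery = np.isin(query_ids, gallery_ids)

    # Pick the acceptance threshold so that at most target_far of impostors pass.
    impostor_scores = np.sort(best_sim[~in_gallery])[::-1]
    k = max(int(target_far * len(impostor_scores)) - 1, 0)
    threshold = impostor_scores[k]

    # DIR: genuine queries that are both accepted and correctly identified at rank 1.
    correct = np.asarray(gallery_ids)[best_idx] == np.asarray(query_ids)
    accepted = best_sim > threshold
    return float(np.mean(correct[in_gallery] & accepted[in_gallery]))
```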

    Cascaded CNN method for far object detection in outdoor surveillance

    In maritime surveillance, the detection of small ships and vessels located far away in the scene is of vital importance for behaviour analysis. Compared to closely located objects, far objects are often captured at a smaller size and lack adequate detail, so conventional detectors fail to recognize them. This paper proposes a CNN-based cascaded method for the reliable detection of objects, and more specifically vessels, located far away from a surveillance camera. The cascaded method improves small-object detection accuracy by additionally processing the obtained candidate regions at their original resolution. The additional processing includes another detection iteration and a sequence of detection-verification steps. Experimental results on our real-world vessel evaluation dataset reveal that the cascaded method increases the recall rate and F1 measure by 13% and 12%, respectively. Another benefit is that the method does not require the adopter to change the model or architecture of the applied network. As an additional contribution, we provide a labeled maritime dataset with open public access.
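    The two-pass idea of re-running the detector on candidate regions at the original resolution can be sketched as follows. This is a generic illustration: detect() stands in for any CNN detector returning (x, y, w, h, score) boxes, and the thresholds are made-up defaults rather than the paper's settings.

```python
# Illustrative two-stage cascade for far/small object detection (not the paper's code).
# `detect(image)` is a placeholder for a CNN detector returning (x, y, w, h, score) boxes.

def cascaded_detect(frame, detect, min_size=32, margin=2.0, score_thr=0.5):
    height, width = frame.shape[:2]
    confirmed = []
    for (x, y, w, h, score) in detect(frame):              # first pass on the whole frame
        if w >= min_size and h >= min_size:
            if score >= score_thr:
                confirmed.append((x, y, w, h, score))
            continue
        # Small candidate: crop an enlarged region at the original resolution.
        cx, cy = x + w / 2.0, y + h / 2.0
        half = margin * max(w, h) / 2.0
        x0, y0 = max(int(cx - half), 0), max(int(cy - half), 0)
        x1, y1 = min(int(cx + half), width), min(int(cy + half), height)
        crop = frame[y0:y1, x0:x1]
        # Second detection iteration on the crop acts as a verification step.
        for (rx, ry, rw, rh, rscore) in detect(crop):
            if rscore >= score_thr:
                confirmed.append((x0 + rx, y0 + ry, rw, rh, rscore))
    return confirmed
```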

    Performance-Efficiency Comparisons of Channel Attention Modules for ResNets

    Attention modules can be added to neural network architectures to improve performance. This work presents an extensive comparison between several efficient attention modules for image classification and object detection, in addition to proposing a novel Attention Bias module with lower computational overhead. All measured attention modules have been efficiently re-implemented, which allows an objective comparison and evaluation of the relationship between accuracy and inference time. Our measurements show that single-image inference time increases far more (5–50%) than the increase in FLOPs suggests (0.2–3%) for a limited gain in accuracy, making computation cost an important selection criterion. Despite this increase in inference time, adding an attention module can outperform a deeper baseline ResNet in both speed and accuracy. Finally, we investigate the potential of adding attention modules to pretrained networks and show that fine-tuning is possible and superior to training from scratch. The choice of the best attention module strongly depends on the specific ResNet architecture, input resolution, batch size and inference framework.
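    For readers unfamiliar with channel attention, the snippet below shows a squeeze-and-excitation style module of the kind compared in this work. It is a generic PyTorch sketch for illustration only, not the proposed Attention Bias module or any specific re-implementation from the paper.

```python
# Generic squeeze-and-excitation style channel attention block (illustrative only).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = x.mean(dim=(2, 3))                           # squeeze: global average pool
        w = self.fc(w).view(x.size(0), x.size(1), 1, 1)  # excitation: per-channel weights
        return x * w                                     # re-scale the feature map
```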

    QoS concept for scalable MPEG-4 video object decoding on multimedia (NoC) chips


    Polyp malignancy classification with CNN features based on Blue Laser and Linked Color Imaging

    In-vivo classification of benign and pre-malignant polyps is a laborious task that requires histopathology confirmation. In an effort to improve the quality of clinical diagnosis, medical experts have come up with visual models, albeit with only limited success. In this paper, a classification approach is proposed to differentiate polyp malignancy, using features extracted from the Global Average Pooling (GAP) layer of a pre-trained Convolutional Neural Network (CNN). Two recently developed endoscopic modalities are used to improve the pipeline's prediction: Blue Laser Imaging (BLI) and Linked Color Imaging (LCI). Furthermore, a new strategy of per-class data augmentation is adopted to tackle the unbalanced class distribution. The results are compared with a more general approach, showing how artificial examples can improve results on highly unbalanced problems. For the same reason, the combined features for each patient are extracted and used to train several machine-learning classifiers without CNNs. Moreover, to speed up computation, a recent GPU-based Support Vector Machine (SVM) scheme is employed to substantially decrease the computational load during training. The presented methodology shows the feasibility of using the LCI and BLI techniques for automatic polyp-malignancy classification and facilitates future advances to limit the need for time-consuming and costly histopathological assessment.
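    The general recipe of GAP features feeding a classical classifier can be sketched as follows. This is an assumption-heavy illustration: it uses a torchvision ResNet-50 and scikit-learn's SVC as stand-ins for the paper's backbone and GPU-based SVM, and the tensors x_bli, x_lci and labels y are dummy placeholders.

```python
# Sketch: global-average-pooled CNN features + a classical SVM (illustrative stand-ins).
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import SVC

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()            # keep the 2048-D GAP output, drop the classifier
backbone.eval()

@torch.no_grad()
def gap_features(images: torch.Tensor) -> np.ndarray:
    """images: (N, 3, H, W), already resized and normalized."""
    return backbone(images).cpu().numpy()

# Dummy stand-ins for BLI/LCI image batches and benign vs. pre-malignant labels.
x_bli, x_lci = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
y = np.array([0, 1] * 4)

# Concatenate per-modality GAP features and train a simple SVM on them.
features = np.concatenate([gap_features(x_bli), gap_features(x_lci)], axis=1)
clf = SVC(kernel="rbf", class_weight="balanced", probability=True).fit(features, y)
```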

    Triplet network for classification of benign and pre-malignant polyps

    Colorectal polyps are critical indicators of colorectal cancer (CRC). Classification of polyps during colonoscopy is still a challenge, for which many medical experts have come up with visual models, albeit with limited success. Early detection of CRC prevents further complications in the colon, which makes the identification of abnormal tissue a crucial step during routine colonoscopy. In this paper, a classification approach is proposed to differentiate between benign and pre-malignant polyps using features learned from a Triplet Network architecture. The study includes a total of 154 patients, with 203 different polyps. For each polyp, an image is acquired with White Light (WL) and additionally with two recent endoscopic modalities: Blue Laser Imaging (BLI) and Linked Color Imaging (LCI). The network is trained with the associated triplet loss, allowing the learning of non-linear features that prove to form a highly discriminative embedding, leading to excellent results with simple linear classifiers. Additionally, acquiring the polyps with WL, BLI and LCI enables the combination of the posterior probabilities, yielding a more robust classification result. Threefold cross-validation is employed as the validation method, and accuracy, sensitivity, specificity and area under the curve (AUC) are computed as evaluation metrics. While our approach achieves a classification performance similar to state-of-the-art methods, it has a much lower inference time (from hours to seconds, on a single GPU). The increased robustness and much faster execution facilitate future advances towards patient safety and may avoid time-consuming and costly histological assessment.
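    The core training signal, a triplet margin loss over anchor/positive/negative samples, and the late fusion of per-modality posteriors can be sketched in a few lines. The network below is a toy embedder and the fusion is a simple average; both are illustrative assumptions, not the paper's architecture.

```python
# Sketch: triplet-loss embedding training and per-modality posterior fusion (illustrative).
import torch
import torch.nn as nn

embedder = nn.Sequential(                       # toy embedding network, not the paper's model
    nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(), nn.Linear(256, 128))
triplet_loss = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(embedder.parameters(), lr=1e-4)

def train_step(anchor, positive, negative):
    """anchor/positive share a class label; negative comes from the other class."""
    loss = triplet_loss(embedder(anchor), embedder(positive), embedder(negative))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def fuse_posteriors(p_wl, p_bli, p_lci):
    """Average the class posteriors predicted from the WL, BLI and LCI images of a polyp."""
    return (p_wl + p_bli + p_lci) / 3.0
```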

    Barrett's lesion detection using a minimal integer-based neural network for embedded systems integration

    Embedded processing architectures are often integrated into devices to develop novel functions in a cost-effective medical system. In order to integrate neural networks into medical equipment, these models require specialized optimizations that prepare them for integration in a high-efficiency, power-constrained environment. In this paper, we research the feasibility of quantized networks with limited memory for the detection of Barrett’s neoplasia. An EfficientNet-lite1+DeepLabv3 architecture is proposed, which is trained using a quantization-aware training scheme in order to achieve an 8-bit integer-based model. The performance of the quantized model is comparable to that of float32-precision models. We show that the quantized model, with only 5-MB memory, is capable of reaching the same performance, with 95% Area Under the Curve (AUC), as a full-precision U-Net architecture that is 10× larger. We have also optimized the segmentation head for efficiency and reduced the output to a resolution of 32×32 pixels. The results show that this resolution captures sufficient segmentation detail to reach a Dice score of 66.51%, which is comparable to the full floating-point model. The proposed lightweight approach also makes the model quite energy-efficient, since it can be executed in real time on a 2-Watt Coral Edge TPU. The obtained low power consumption of the lightweight Barrett’s esophagus neoplasia detection and segmentation system enables its direct integration into standard endoscopic equipment.
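    The quantization-aware training recipe itself (fake-quantized training followed by conversion to an int8 model) follows a standard pattern; a minimal PyTorch eager-mode sketch is shown below with a tiny stand-in network, since reproducing the EfficientNet-lite1+DeepLabv3 model is out of scope here.

```python
# Minimal quantization-aware training (QAT) sketch in PyTorch eager mode.
# TinyNet is a stand-in; the paper uses an EfficientNet-lite1 + DeepLabv3 architecture.
import torch
import torch.nn as nn
import torch.quantization as tq

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()           # marks the float -> int8 boundary
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()
        self.head = nn.Conv2d(8, 2, 1)        # low-resolution segmentation logits
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.head(self.relu(self.conv(x)))
        return self.dequant(x)

model = TinyNet()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
model_qat = tq.prepare_qat(model.train())     # insert fake-quantization observers

for _ in range(3):                            # stand-in for the normal training loop
    out = model_qat(torch.randn(2, 3, 32, 32))
    out.sum().backward()

model_int8 = tq.convert(model_qat.eval())     # 8-bit integer weights for deployment
```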

    Estimating Physical Camera Parameters based on Multi-Sprite Motion Estimation

    Global-motion estimation algorithms, as employed in the MPEG-4 or H.264 video coding standards, describe motion with a set of abstract parameters. These parameters model the camera motion, but they cannot be directly related to a physical meaning such as rotation angles or the focal length. We present a two-step algorithm to factorize these abstract parameters into physically meaningful operations. The first step applies a fast linear estimation method. In an optional second step, these parameters can be refined with a non-linear optimization algorithm. The attractiveness of our algorithm is its combination with the multi-sprite concept, which allows for unrestricted rotational camera motion, including varying focal lengths. We present results for several sequences, including the well-known Stefan sequence, which can only be processed with the multi-sprite approach.
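    For intuition, the factorization can be thought of as decomposing a rotation-only homography H ≈ K R K^-1 into a focal length and rotation angles. The toy sketch below assumes a known (or guessed) focal length, a principal point at the origin, and a standard Z-Y-X Euler convention; it illustrates the general idea rather than the paper's actual estimator.

```python
# Toy sketch: factor a rotation-only homography H ≈ K R K^-1 into rotation angles,
# assuming a known focal length f and the principal point at the origin.
# This only illustrates the idea of the linear step, not the paper's method.
import numpy as np

def factor_homography(H, f):
    K = np.diag([f, f, 1.0])
    R = np.linalg.inv(K) @ H @ K                 # rotation up to scale and noise
    U, _, Vt = np.linalg.svd(R)                  # project onto the nearest rotation matrix
    R = U @ Vt
    if np.linalg.det(R) < 0:
        R = -R
    # Z-Y-X Euler angles (yaw, pitch, roll) in radians.
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arcsin(-R[2, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    return R, (yaw, pitch, roll)
```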

    Evaluation of a Feature-Based Global-Motion Estimation System

    Global-motion estimators are an important part of current video-coding systems like MPEG-4, content analysis and description systems like MPEG-7, and many video-object segmentation algorithms. Feature-based motion estimators use the motion vectors obtained for a set of selected points to calculate the parameters of the global-motion model. This involves the detection of feature points, the computation of correspondences between two sets of features, and the motion-parameter estimation. In this paper, we present a feature-based global-motion estimation system and discuss each of its parts in detail. The idea is to provide an overview of a general-purpose feature-based motion estimator and to point out the important design aspects. We evaluate the performance of different feature detection algorithms, propose an efficient feature-correspondence algorithm, and compare the difference between a non-linear parameter estimation and a linear approximation. Finally, the RANSAC-based robust parameter estimation is examined; we show why it does not reach its theoretical performance and propose a modification to increase its accuracy. Our global-motion estimator has an average accuracy of 0.15 pixels with real-time execution.
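    The overall pipeline (feature detection, correspondence search, robust model fitting) can be reproduced in a few lines with off-the-shelf components. The OpenCV-based sketch below is a generic illustration with assumed parameter values, not the authors' implementation or their modified RANSAC.

```python
# Generic feature-based global-motion estimation with OpenCV (illustrative only).
import cv2
import numpy as np

def estimate_global_motion(prev_gray, curr_gray):
    orb = cv2.ORB_create(nfeatures=1000)                 # feature-point detection
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)                  # feature correspondences

    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Robust (RANSAC) fit of a perspective global-motion model.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=1.0)
    return H, inlier_mask
```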